In recent years, with the rapid growth of wind power generation and its high penetration in
power systems, wind power prediction has become an important research issue. Wind power
has complicated dynamics for modeling and prediction. In this paper, different hybrid prediction models
based on neural networks trained by various optimization approaches are examined to forecast the wind
power time series from Alberta, Canada. First, time series analysis is performed based on recurrence
plots and correlation analysis to select the proper input sets for the forecasting models. Next, a
comparative study is carried out among neural networks trained by the imperialist competitive algorithm
(ICA), the genetic algorithm (GA), and the particle swarm optimization (PSO) approach. The simulation
results indicate that ICA outperforms the other approaches in tuning the neural network for wind power forecasting.
1. Introduction

Electrical power generation is increasing because of population
growth and the resulting rise in electrical energy demand [1].
As generation from thermal power plants grows, thermal pollution
increases and more greenhouse gases are produced. This has raised
interest in power generation based on renewable energies [2].
Among the renewable energy sources, electrical power generation
based on wind energy has been the fastest growing [3]. It is
estimated that in 2020 about 12% of the world's electrical energy
will be supplied from wind energy [4]. Therefore, the electricity
generated by wind power will play an important role in electricity
supply.

Wind power depends on weather conditions such as wind
speed, wind direction, temperature and air pressure, as well as on
environmental obstacles. As a dynamic system, wind power is also
correlated with its own past values at any time [3]. Because of its
dependence on atmospheric parameters, wind power has been
recognized as a non-dispatchable source [5]. This feature makes
wind power an uncertain variable and reduces the system
reliability. An accurate prediction of wind power variations can
therefore moderate this problem to some extent [6,7].

Wind power prediction based on meteorological variables
faces some difficulties. Sufficiently accurate measurements of
meteorological variables are commonly unavailable, and the
measurement equipment is too expensive to be installed and
maintained everywhere. Inaccurate measurements or estimations
can, in turn, result in large errors in the wind power forecast.
Moreover, the true model of the wind power generating unit is
commonly not at hand. Therefore, achieving a low wind power
forecasting error via a relatively simple black-box model with a
small number of measurable input variables is highly desirable.

Based on the above discussion, this paper considers wind power
forecasting with the historical wind power data as the forecasting
model inputs. The optimal training of neural networks is proposed
as our modeling approach, and four seasonal wind power data sets
of a wind farm in Alberta, Canada [8] are studied as the real data
for model construction and evaluation. In order to construct the
neural network model for forecasting the wind power, time series
analysis based on recurrence plots and correlation analysis is first
applied to the available wind power time series. In the next stage,
a comparative study is carried out among neural networks trained
by the imperialist competitive algorithm (ICA) [9], the genetic
algorithm (GA) [10], and the particle swarm optimization (PSO)
[11,12] approach. The simulation results indicate that ICA
outperforms the other approaches in tuning the neural network for
wind power forecasting.

This paper is organized as follows. In Section 2, the related
research is introduced. In Section 3, the data properties and the
input selection approach are described. In Section 4, the proposed
wind power prediction engine is presented. In Section 5, the design
and evaluation of the forecasting models for the wind power time
series of Alberta, Canada are described. Finally, Section 6 concludes
the paper.

2. Related research

Wind power forecasting methods can be categorized as
physical models and time series (statistical) models [13,14]. In
physical modeling, one tries to estimate the wind speed time
series by taking into account the physical characteristics of the
environmental conditions [15]. A statistical model attempts
to find a relationship between the parameters of the historical
data in order to predict the future wind speed and wind power [16].
Commonly, physical models are used for long-term prediction
and statistical models are used for short-term prediction [17].

In the literature, there are different attempts at short-term
wind power forecasting via hybrid time series methods. In [18],
wind power prediction has been done via a combination of a
modified hybrid neural network and an enhanced particle swarm
optimization algorithm. In [19], a wavelet transform support vector
machine in conjunction with statistical-characteristics analysis
has been employed for short-term wind power prediction. In
[20], a method has been presented to improve the short-term
wind power prediction at a given turbine using information from
numerical weather prediction and from multiple observation
points. In that work, the prediction of wind power is achieved
in two stages: in the first stage, the wind speed is predicted using
the proposed method; in the second stage, the conversion of wind
speed to output power is accomplished using a power curve model.
In [21], a model based on the wavelet transform, chaotic time
series and the GM(1,1) method has been presented for wind farm
power forecasting. A new approach based on clustering has been
proposed in [22], and in [23] the ultra-short-term prediction of
wind power based on chaotic time series has been considered.
Artificial neural networks (ANNs) optimized by the Tabu search
algorithm [24], a hybrid PSO-ANFIS approach [25], wind farm power
generation based on fuzzy modeling [26], and a hybrid strategy for
short-term wind power prediction based on the physical strategy
and the ANN technique [27] have been addressed in the literature as
well. Besides, comprehensive reviews of the methods and
models of wind power forecasting may be found in [28-30].

3.  The  data  properties  and  selection  of  appropriate  input  set 

As stated earlier, this paper considers the prediction of
experimental wind power data from a wind farm in Alberta,
Canada [8]. The available data are four seasonal data sets for the
year 2007, each containing 1368 hourly samples. The wind power is
predicted using feed-forward neural networks trained by three
optimization algorithms, namely ICA, GA and PSO. In feed-forward
neural networks, the outputs at any moment depend only
on the neural weights and the input signals to the neural network
at that moment. Therefore, proper selection of the inputs is essential
to obtain good performance from the trained neural network. To this
end, two stages are followed to determine the neural
network inputs for each seasonal data set. In the first stage, the
characteristics and predictability of the wind power time series are
investigated via recurrence plots. Based on the derived results, in
the next stage, correlation analysis is performed to choose
proper input sets for the four seasonal data sets.

3.1. The available data and its properties

In search of the proper inputs for our models, this section closely
examines the experimental data from the Alberta, Canada wind farm [8]
for the year 2007. As mentioned earlier, the available
data are four seasonal data sets, each containing 1368 hourly
samples. These data are shown in Fig. 1(a)-(d).
As shown in these figures, severe fluctuations are observed in the
wind power time series, while no hallmark of strong periodicity is
visible. Such fluctuations may be due to the
chaotic or the stochastic nature of a nonlinear process [31-33]. Since
we are interested in predictability, it is important for us to
distinguish between these two types of processes. This property
has been closely examined by the authors in [34] via time series
analysis methods, where the results indicate a stochastic nature,
and hence only short-term predictability, of the wind power time
series. To briefly present the results of [34] and to get a better
view of the behavior and characteristics of the underlying system,
the recurrence plots (RPs) of the wind power time series are
investigated in this section.

Fig. 1. The Alberta wind power time series for (a) the 1st season, (b) the 2nd season, (c) the 3rd season and (d) the 4th season.



3.1.1. The fundamentals of recurrence plots

Recurrence is a fundamental property of dynamic systems,
which can be exploited to characterize the system's behavior. As a
powerful tool for the visualization and analysis of the phase space
trajectory of experimental time series, the recurrence plot (RP)
was introduced in the late 1980s by Eckmann et al. [35]. It is
especially useful for finding hidden correlations in highly
complicated data and for determining the stationarity of a time series
[36]. With an RP, one can graphically detect hidden patterns and
structural changes in data or see similarities in patterns across the
time series under study [37]. This technique has been successfully
applied in various fields, such as physiology [38,39], fluid
dynamics [40], geology [41], economy [42], as well as energy
market indices [43-45]. In this paper, the RP methodology is
applied to analyze the behavior of the wind power time series. In
particular, the predictability of the wind time series is investigated
via these analyses.

To derive an RP, the phase space of the signal must first be
reconstructed, for instance via the "method of delays" [46]. RPs
visualize the behavior of trajectories in phase space [36,47,48] via
a graphical representation of the matrix:

$$R_{ij} = \Theta\left(\varepsilon - \lVert x_i - x_j \rVert\right) \tag{1}$$

where $x_i$ stands for the point in the reconstructed phase space at
time i, $\varepsilon$ is a predefined threshold and $\Theta(\cdot)$ is the Heaviside
function. One assigns a "black" dot to the value one and a "white"
dot to the value zero. The two-dimensional graphical representation
of $R_{ij}$ is then called the RP [47] and can be used to distinguish
between different dynamic systems. In this context, the recurrence
plot examines the paths in the state space. Three types of
systems are recognized based on the obtained plot: (1) periodic
systems, (2) stochastic systems and (3) chaotic systems [47,48].
Periodic systems are marked by parallel, uninterrupted diagonal
lines, where the distance between the lines is proportional to the
period. Such diagonal lines are also seen in chaotic systems, but
there the lines are interrupted and shorter, and the distance
between them is irregular. The length of the lines is
proportional to the degree of predictability of the system. RPs of
uncorrelated stochastic systems consist of many isolated dots
whose distribution is quite irregular [47,48].

Table 1
The embedding delay and embedding dimension for the Alberta wind power time series.

Season #              1    2    3    4
Embedding delay       5    7    7    6
Embedding dimension   20   14   16   16
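
The construction of Eq. (1) can be illustrated with a minimal sketch. It is not the CRP toolbox implementation; the function names `delay_embed` and `recurrence_matrix`, the toy sine signal and the threshold value are assumptions made only for illustration, and the Heaviside function is taken as one at zero.

```python
import numpy as np

def delay_embed(series, dim, delay):
    """Method of delays: x_i = (s_i, s_{i+delay}, ..., s_{i+(dim-1)*delay})."""
    series = np.asarray(series, dtype=float)
    n_points = len(series) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("series too short for this dimension/delay")
    cols = [series[k * delay : k * delay + n_points] for k in range(dim)]
    return np.column_stack(cols)

def recurrence_matrix(points, eps):
    """R_ij = 1 where ||x_i - x_j|| <= eps, else 0 (Eq. (1))."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist <= eps).astype(int)

# Toy periodic signal (hypothetical data, period of 25 samples).
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 25)
rp = recurrence_matrix(delay_embed(signal, dim=3, delay=6), eps=0.2)
```

For a periodic signal such as this one, the ones in `rp` line up along diagonals spaced by the period, which is the pattern described above for periodic systems.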


3.1.2. The recurrence plot analysis results

In order to reconstruct the phase space of the wind time series,
the embedding delay and the embedding dimension of
the time series must first be obtained. The mutual information method
[46] and the false nearest neighbors method have been used to
calculate the embedding delay and the embedding dimension,
respectively, of the four seasonal wind power time series. The
resulting embedding delays and embedding dimensions are given in
Table 1. The RPs are constructed using these dimensions and
delays.

The RPs of the wind power time series, plotted via the CRP toolbox
of MATLAB [49], are shown in Fig. 2(a)-(d). From
these figures it is concluded that, for the first and second seasons,
the erratic short-term distribution of recurrence points indicates
a strongly stochastic nature of the underlying time series
with limited predictability. The situation is somewhat different in
seasons 3 and 4, where the recurrence diagonals are longer and
thus the predictability is higher. White ribbons in the
recurrence plots correspond to transitions in the system dynamics.
Such dynamic transitions, together with the differing seasonal
properties, indicate the seasonality as well as the non-stationarity of
the wind power time series. Therefore, in selecting the inputs for
the forecasting model, the limited predictability of the wind time
series should be taken into account. Besides, since the forecasting
model inputs are the lagged wind power terms, they should be as
close as possible to the desired time to compensate for the
non-stationarity of the dynamics.

Fig. 2. The RP of the Alberta wind power time series for (a) the 1st season, (b) the 2nd season, (c) the 3rd season and (d) the 4th season.

Based on the above discussion, in the following section,
correlation analysis is carried out to select the appropriate inputs
for the forecasting model.

3.2. Correlation analysis

Having established the limited predictability of the wind time
series of interest, we analyze the correlation properties of the
available data to choose the proper model inputs. The plots in
Fig. 3(a)-(d) show the autocorrelation functions of the seasonal
wind power data sets.

These figures illustrate that the wind power in each
hour is highly correlated with its lagged values of the same day, up
to a few hours back. For the previous days, the correlation decays
rapidly, which is another hallmark of limited predictability.
We adopt a correlation threshold of 0.7 to select the model
inputs. This threshold corresponds to 6, 6, 4, and 6 lagged values
for the four seasons, respectively.
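
The threshold rule can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the Alberta data: the function names, the `max_lag` default and the toy signal are assumptions, and the sample autocorrelation uses the common biased estimator.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def select_lags(series, threshold=0.7, max_lag=48):
    """Keep the leading lags until the autocorrelation first drops below threshold."""
    lags = []
    for k, r in enumerate(autocorrelation(series, max_lag), start=1):
        if r < threshold:
            break
        lags.append(k)
    return lags

# Hypothetical smooth signal standing in for one seasonal series.
t = np.arange(1000)
toy_series = np.sin(2 * np.pi * t / 100)
selected = select_lags(toy_series, threshold=0.7)
```

Stopping at the first lag below the threshold keeps only the most recent lags, in line with the requirement that the inputs stay close to the forecast hour.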

4. The power prediction engine

Given the high performance of neural networks in modeling
nonlinear dynamics, they have been employed in this paper
as our modeling tool for wind power prediction. In this section, we
briefly review the basics of neural networks and then turn to
the developed models and their performance.

4.1. The fundamentals of ANNs

Neural networks are highly interconnected simple processing
units designed to model how the human brain performs a
particular task [50,51]. Each of those units, called neurons, forms a
weighted sum of its inputs, to which a constant term called the bias is
added. This sum is then passed through a transfer function: linear,
sigmoid or hyperbolic tangent (Fig. 4(a)).

In a typical ANN, the neurons are organized in a way that
defines the network architecture. Networks with interconnections
that do not form any loops are called feed-forward. Recurrent or
non-feed-forward networks, in which there are one or more loops
of interconnections, are also used for some kinds of applications
[52]. Multilayer perceptrons (MLPs) are the best known and most
widely used kind of ANNs. In these neural networks, neurons are
arranged in layers: an input layer, one or more hidden layers and
an output layer. The neurons in each layer may share the same
inputs, but are not connected to each other. Fig. 4(b) shows the
architecture of a generic three-layer feed-forward neural network
model. In order to find the optimal network architecture, one
should evaluate several combinations. These combinations include
networks with different numbers of hidden layers, different
numbers of neurons in each layer and different types of transfer
functions. Typically, the number of neurons in the hidden layer is
chosen by trial and error.

Fig. 3. The autocorrelation of the Alberta wind power time series for (a) the 1st season, (b) the 2nd season, (c) the 3rd season and (d) the 4th season.

Fig. 4. (a) Internal structure of a neuron and (b) the structure of an example three-layer feed-forward neural network.

In feed-forward neural networks, the output depends only
on the input signals and the neural weights at that moment. The
activation functions used in the hidden layers are commonly nonlinear
transfer functions such as the log-sigmoid function, whose output lies
in the interval [0, 1], or the tan-sigmoid function, which maps its
input to the interval [-1, 1]. The output of a hidden layer is computed as:

$$n_i = w_{i1} x_1 + w_{i2} x_2 + \dots + w_{iR} x_R + b_i \tag{2}$$

$$a_i = f(n_i), \quad i = 1, 2, 3, \dots, S \tag{3}$$

where $x_R$ is the Rth input; S is the number of neurons; $w_{iR}$ is the
weight connecting the Rth input to the ith neuron of the hidden
layer; $b_i$ is its bias; and $f(\cdot)$ is the activation function. The output
layer computes its output in the same manner as the hidden
layers, except that the linear transfer function is commonly used as
the activation function in this layer.
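
Eqs. (2) and (3) amount to a matrix-vector product followed by an element-wise activation. The sketch below is illustrative only; the function names and the small numerical example are assumptions, with `np.tanh` standing in for the tan-sigmoid function.

```python
import numpy as np

def layer_output(x, W, b, f=np.tanh):
    """Eqs. (2)-(3): n = W x + b, a = f(n); row i of W holds w_i1..w_iR."""
    return f(W @ x + b)

def mlp_forward(x, hidden_layers, W_out, b_out):
    """Pass x through the hidden layers, then a linear output layer."""
    a = np.asarray(x, dtype=float)
    for W, b in hidden_layers:
        a = layer_output(a, W, b)
    return W_out @ a + b_out  # linear transfer function in the output layer

# Hypothetical example: R = 2 inputs, S = 1 hidden neuron, 1 output.
W1, b1 = np.array([[1.0, 2.0]]), np.array([0.5])
W2, b2 = np.array([[3.0]]), np.array([-1.0])
y_hat = mlp_forward([1.0, 1.0], [(W1, b1)], W2, b2)
```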

Forecasting with neural networks involves two steps: training
and testing. Training of feed-forward networks is normally
performed in a supervised manner, in which both the inputs
and the desired outputs take part in training the network. The
adequate selection of inputs for neural network training is highly
influential to the success of training. A learning process in the
neural network then constructs an input-output mapping by
adjusting the weights and biases at each iteration based on
the minimization of some error measure between the produced
and the desired output. Thus, learning entails an optimization
process. The knowledge acquired by the neural network through the
learning process is tested by applying new data that it has never seen
before, called the testing set. The network should be able to
generalize and produce an accurate output for this unseen data [43].

The most common learning algorithm is the back propagation
algorithm [53], in which the error is propagated back toward the input
in order to adjust the weights and biases in each layer.
The standard back propagation learning algorithm is a steepest
descent algorithm that minimizes the sum of squared errors. It
is not numerically efficient and tends to converge slowly [50,53].
An algorithm that trains a neural network 10-100 times faster than
the usual back propagation algorithm is the Levenberg-Marquardt
algorithm, which is a variation of Newton's method [50]. The
function V(x) to be minimized with respect to the parameter vector
x (the network weights and biases) is the sum of squared errors:

$$V(x) = \frac{1}{2} \sum_{h} e_h^2(x) \tag{4}$$

where $e_h(x)$, h = 1, 2, ..., is the hth element of the output error
vector e(x). The details of the Levenberg-Marquardt algorithm can be
found in [43]. This method, however, commonly suffers from a lack of
convergence to the global optimum. Therefore, employing a more
efficient optimization algorithm may lead to a more accurate response,
a lower forecasting error and better convergence. Based on the above
discussion, in the following sections, three optimization algorithms,
namely the imperialist competitive algorithm (ICA), the genetic
algorithm (GA), and the particle swarm optimization (PSO) approach,
are employed for training the neural network to forecast the wind
power time series from Alberta, Canada, and the results are compared
for this case.
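
For a population-based optimizer, the network weights and biases are conveniently stored in one flat vector, and Eq. (4) becomes a plain function of that vector. The following sketch shows one possible encoding; the packing order, the function names and the zero-weight example are assumptions, not the layout used in the reported experiments.

```python
import numpy as np

def n_params(sizes):
    """Number of weights and biases for layer sizes such as [6, 7, 5, 1]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def unpack(theta, sizes):
    """Split a flat parameter vector into (W, b) pairs, layer by layer."""
    params, pos = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = theta[pos : pos + n_in * n_out].reshape(n_out, n_in)
        pos += n_in * n_out
        b = theta[pos : pos + n_out]
        pos += n_out
        params.append((W, b))
    return params

def cost(theta, sizes, X, y):
    """Eq. (4): half the sum of squared output errors over all samples."""
    params = unpack(np.asarray(theta, dtype=float), sizes)
    a = np.asarray(X, dtype=float).T          # shape (n_inputs, n_samples)
    for W, b in params[:-1]:
        a = np.tanh(W @ a + b[:, None])       # tan-sigmoid hidden layers
    W, b = params[-1]
    out = (W @ a + b[:, None]).ravel()        # linear output layer
    e = np.asarray(y, dtype=float) - out
    return 0.5 * float(np.dot(e, e))

# Hypothetical data: 10 samples with 6 lagged inputs each.
sizes = [6, 7, 5, 1]
X_demo, y_demo = np.zeros((10, 6)), np.ones(10)
v0 = cost(np.zeros(n_params(sizes)), sizes, X_demo, y_demo)
```

Any of the optimizers discussed next can then minimize `cost` over vectors of length `n_params(sizes)` without access to gradients.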

4.2.  The  trainer  unit 

In  this  section  the  optimization  algorithms  employed  for  training 
the  forecasting  neural  network  models  are  briefly  introduced. 

4.2.1.  Imperialist  competitive  algorithm 

The imperialist competitive algorithm (ICA) is a relatively new
optimization technique inspired by the social and political processes
of imperialistic competition among countries. ICA has shown
outstanding ability on various problems [54-57]. The algorithm starts
with an initial population of N countries, among which the $N_{imp}$ best
ones (the countries with the lowest cost) are selected as imperialists;
the remaining countries form the colonies of these imperialists. In
[58], the ICA pseudo-code is described as follows:

i. Selecting random points on the function and initializing
the empires.

ii. Moving the colonies toward their related imperialist (absorption
policy or assimilation) according to a predetermined assimilation
coefficient ($\beta > 1$) and assimilation angle coefficient ($\gamma$), which
determine the amount and the angle of movement.

iii. Randomly changing the location of some colonies (revolution).

iv. As long as the cost of a colony is not lower than that of its
imperialist, the colony remains in the empire; otherwise, the colony
and the imperialist exchange positions.

v. Uniting the empires that have the same conditions.

vi. Calculating the total cost of all empires via:

$$\text{Total cost of empire} = \text{Cost of imperialist} + \xi \times \text{mean}(\text{cost of all colonies}) \tag{5}$$

where $\xi$ is a constant and mean(.) stands for the average of its
arguments.

vii. Selecting the weakest colony (colonies) from the weakest
empire and giving it (them) to one of the other empires (colonial
competition).

viii. Eliminating the powerless empires.

ix. If the preset stopping conditions are satisfied, stop; otherwise,
return to step ii.
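
The steps above can be sketched in a compact, simplified form. This is not the reference implementation of [58]: the assimilation angle, the uniting of similar empires (step v) and power-proportional colony assignment are omitted, colonies are handed to a uniformly random empire, and all parameter names and defaults are assumptions chosen for illustration.

```python
import numpy as np

def ica_minimize(cost, dim, n_countries=50, n_imp=5, n_iter=200, beta=2.0,
                 p_revolution=0.1, xi=0.1, bounds=(-5.0, 5.0), seed=0):
    """Simplified ICA sketch; returns (best_position, best_cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Step i: random countries; the n_imp cheapest become imperialists.
    pos = rng.uniform(lo, hi, size=(n_countries, dim))
    costs = np.array([cost(p) for p in pos])
    order = np.argsort(costs)
    imp_pos, imp_cost = pos[order[:n_imp]].copy(), costs[order[:n_imp]].copy()
    col_pos, col_cost = pos[order[n_imp:]].copy(), costs[order[n_imp:]].copy()
    owner = rng.integers(0, n_imp, size=len(col_pos))  # empire of each colony
    alive = np.ones(n_imp, dtype=bool)
    best_pos, best_cost = imp_pos[0].copy(), float(imp_cost[0])

    for _ in range(n_iter):
        # Step ii: assimilation, a random step of up to beta times the gap.
        step = beta * rng.random(col_pos.shape) * (imp_pos[owner] - col_pos)
        col_pos = np.clip(col_pos + step, lo, hi)
        # Step iii: revolution, some colonies jump to random positions.
        revolt = rng.random(len(col_pos)) < p_revolution
        col_pos[revolt] = rng.uniform(lo, hi, size=(int(revolt.sum()), dim))
        col_cost = np.array([cost(p) for p in col_pos])
        live = np.flatnonzero(alive)
        total = {}
        for e in live:
            members = np.flatnonzero(owner == e)
            mean_col = 0.0
            if members.size:
                # Step iv: a colony cheaper than its imperialist swaps with it.
                b = members[np.argmin(col_cost[members])]
                if col_cost[b] < imp_cost[e]:
                    imp_pos[e], col_pos[b] = col_pos[b].copy(), imp_pos[e].copy()
                    imp_cost[e], col_cost[b] = col_cost[b], imp_cost[e]
                mean_col = col_cost[members].mean()
            total[e] = imp_cost[e] + xi * mean_col  # step vi, Eq. (5)
            if imp_cost[e] < best_cost:
                best_pos, best_cost = imp_pos[e].copy(), float(imp_cost[e])
        # Steps vii-viii: the weakest empire loses a colony or collapses.
        if live.size > 1:
            weakest = max(total, key=total.get)
            others = live[live != weakest]
            members = np.flatnonzero(owner == weakest)
            if members.size:
                victim = members[np.argmax(col_cost[members])]
                owner[victim] = rng.choice(others)
            else:
                alive[weakest] = False
                col_pos = np.vstack([col_pos, imp_pos[weakest]])
                col_cost = np.append(col_cost, imp_cost[weakest])
                owner = np.append(owner, rng.choice(others))
    return best_pos, best_cost

def sphere(x):
    """Hypothetical test function with its minimum at the origin."""
    return float(np.sum(np.asarray(x) ** 2))

ica_x, ica_c = ica_minimize(sphere, dim=3)
```

For network training, `cost` would be the training index of Eq. (4) evaluated on a flat vector of weights and biases.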


4.2.2.  Genetic  algorithm 

A genetic algorithm emulates biological evolution to solve
optimization problems. It is formed by a set of individual elements
(the population) and a set of biologically inspired operators that can
change these individuals. According to evolutionary theory, only
the individuals that are best suited in the population are
likely to survive and to generate offspring, thus transmitting
their biological heredity to new generations.

In computing terms, genetic algorithms map strings of numbers
to each potential solution. Each solution becomes an individual
in the population, and each string becomes a representation
of an individual. There should be a way to derive each individual
from its string representation. The genetic algorithm then
manipulates the most promising strings in its search for an improved
solution. The algorithm operates through a simple cycle [10]:

i.  Creation  of  a  population  of  strings. 

ii.  Evaluation  of  each  string. 

iii.  Selection  of  the  best  strings. 

iv.  Genetic  manipulation  to  create  a  new  population  of  strings. 
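
The cycle above can be sketched for real-valued strings. The operators chosen here (binary tournament selection, uniform crossover, Gaussian mutation and elitism) are one common combination and are assumptions, not necessarily the operators used in the reported experiments.

```python
import numpy as np

def ga_minimize(cost, dim, pop_size=40, n_gen=100, p_mut=0.1, sigma=0.3,
                bounds=(-5.0, 5.0), seed=0):
    """Real-coded GA sketch; returns (best_string, best_cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))        # i. create strings
    best_x, best_c = None, np.inf
    for _ in range(n_gen):
        costs = np.array([cost(ind) for ind in pop])       # ii. evaluate
        i_best = int(np.argmin(costs))
        if costs[i_best] < best_c:
            best_x, best_c = pop[i_best].copy(), float(costs[i_best])
        # iii. selection: binary tournaments keep the cheaper string.
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = pop[np.where(costs[a] < costs[b], a, b)]
        # iv. manipulation: uniform crossover, then Gaussian mutation.
        mates = parents[rng.permutation(pop_size)]
        take_first = rng.random((pop_size, dim)) < 0.5
        children = np.where(take_first, parents, mates)
        mutate = rng.random((pop_size, dim)) < p_mut
        children = children + mutate * rng.normal(0.0, sigma, size=(pop_size, dim))
        pop = np.clip(children, lo, hi)
        pop[0] = best_x                                     # elitism
    return best_x, best_c

def sphere(x):
    """Hypothetical test function with its minimum at the origin."""
    return float(np.sum(np.asarray(x) ** 2))

ga_x, ga_c = ga_minimize(sphere, dim=3)
```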


4.2.3.  Particle  swarm  optimization  algorithm 

Particle swarm optimization (PSO) is a method for performing
numerical optimization without explicit knowledge of the gradient
of the problem to be optimized. PSO is originally attributed to
Kennedy and Eberhart and was first intended for simulating social
behavior [12]. The algorithm was simplified and it was observed to
be performing optimization. PSO is an efficient population-based
optimization technique, which is appropriate for non-convex
optimization problems [11,12]. In general, the velocity update of
the ith particle at the (k+1)th iteration is expressed as [11]:

$$v_i^{k+1} = w \times v_i^k + c_1 \times r_1 \times (p_i - x_i^k) + c_2 \times r_2 \times (p_g - x_i^k) \tag{6}$$

$$x_i^{k+1} = x_i^k + v_i^{k+1} \tag{7}$$

where, in Eq. (6), $v_i^k$ is the velocity of the ith particle at the kth
iteration, $p_i$ is the best known position of the ith particle, $p_g$ is
the swarm's best known position, w is the inertia weight, $c_1$ and $c_2$
are the learning factors, $r_1$ and $r_2$ are random numbers in [0, 1], and
$x_i^k$ is the position of the ith particle at the kth iteration. In
Eq. (6), the first term provides the necessary momentum for particles
to roam across the problem space. The second term is the cognitive
component that represents the individual experience of each particle;
it encourages the particles to move toward their own best positions
reached. The last term is the social collaboration of the
particles in finding the global optimal solution; it pulls the
particles toward the global best position reached. Finally, the
position of the ith particle is updated by Eq. (7) [11].
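
Eqs. (6) and (7) translate almost line by line into code. The sketch below is illustrative; the parameter values (w = 0.7, c1 = c2 = 1.5), the swarm size and the test function are assumptions.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5,
                 bounds=(-5.0, 5.0), seed=0):
    """Basic PSO sketch; returns (best_position, best_cost)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                               # velocities
    p_best = x.copy()                                  # personal bests p_i
    p_cost = np.array([cost(p) for p in x])
    g = int(np.argmin(p_cost))
    g_best, g_cost = p_best[g].copy(), float(p_cost[g])  # swarm best p_g
    for _ in range(n_iter):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)  # Eq. (6)
        x = x + v                                                    # Eq. (7)
        c = np.array([cost(p) for p in x])
        improved = c < p_cost
        p_best[improved] = x[improved]
        p_cost[improved] = c[improved]
        g = int(np.argmin(p_cost))
        if p_cost[g] < g_cost:
            g_best, g_cost = p_best[g].copy(), float(p_cost[g])
    return g_best, g_cost

def sphere(x):
    """Hypothetical test function with its minimum at the origin."""
    return float(np.sum(np.asarray(x) ** 2))

pso_x, pso_c = pso_minimize(sphere, dim=3)
```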

4.3.  Evaluation  indices 

As stated earlier, to evaluate the ANN's performance, a
testing set containing new input data that it has never seen before
is applied to the trained network. The performance of the trained
network is then evaluated by comparing the network output
with its actual value. Some statistical evaluation indices
are commonly used to judge an ANN's performance.
Let $A_i$ and $P_i$ be the actual and the network output, respectively,
related to the ith input vector, where N is the number of points in the
testing set. Then the evaluation indices are defined as [43]:


• Mean absolute error (MAE):

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \lvert P_i - A_i \rvert \tag{8}$$

• Root mean square error (RMSE):

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (P_i - A_i)^2} \tag{9}$$

• Mean absolute percentage error (MAPE):

$$MAPE = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert P_i - A_i \rvert}{A_i} \times 100\% \tag{10}$$

• Modified mean absolute percentage error (Modified_MAPE): In
Eq. (10), if the actual value is large and its
prediction is small, the computed relative error will
be near 100%. On the other hand, if the actual value is
small, the relative error may become very large even though
the difference is small. To avoid this, Eq. (10)
is modified as follows. First, the average of the actual output
values is computed as:

$$A_{av} = \frac{1}{N} \sum_{i=1}^{N} A_i$$

and then the Modified_MAPE is computed as [59]:

$$Modified\_MAPE = \frac{1}{N} \sum_{i=1}^{N} \frac{\lvert P_i - A_i \rvert}{A_{av}} \times 100\% \tag{11}$$

• Modified peak absolute percentage error (Modified_PAPE):

$$Modified\_PAPE = \max_{1 \le i \le N} \left( \frac{\lvert P_i - A_i \rvert}{A_{av}} \right) \times 100\% \tag{12}$$
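
The indices of Eqs. (8)-(12) can be computed together in a few lines. The function name and the three-point example are assumptions used only to illustrate the formulas.

```python
import numpy as np

def evaluation_indices(actual, predicted):
    """Return MAE, RMSE, MAPE, Modified_MAPE and Modified_PAPE (Eqs. (8)-(12))."""
    A = np.asarray(actual, dtype=float)
    P = np.asarray(predicted, dtype=float)
    abs_err = np.abs(P - A)
    a_av = A.mean()  # average of the actual values
    return {
        "MAE": abs_err.mean(),
        "RMSE": np.sqrt(np.mean((P - A) ** 2)),
        "MAPE": np.mean(abs_err / A) * 100.0,          # undefined if any A_i is 0
        "Modified_MAPE": np.mean(abs_err / a_av) * 100.0,
        "Modified_PAPE": np.max(abs_err / a_av) * 100.0,
    }

# Hypothetical actual and predicted values.
indices = evaluation_indices([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
```

Dividing by the average actual value keeps the modified indices finite even when an individual hourly output is zero, which the plain MAPE cannot guarantee.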


Table  3 

Selected  inputs,  WP2(t),  for  the  Alberta  wind  power  time  series,  2nd  Season. 


Rank 

Selected 

inputs 

Auto¬ 

correlation 

Rank 

Selected 

inputs 

Auto¬ 

correlation 

1 

WP2(t- 1) 

0.973 

4 

WP2(t- 4) 

0.831 

2 

WP2(t- 2) 

0.928 

5 

WP2(t- 5) 

0.781 

3 

WP2(t- 3) 

0.881 

6 

WP2(t- 6) 

0.733 

Table  4 

Selected  inputs,  WP3(t),  for  the  Alberta  wind  power  time  series, 

3rd  season. 

Rank  Selected 

Auto- 

Rank  Selected 

Auto- 

inputs 

correlation 

inputs 

correlation 

1  WP3(t-l) 

0.956 

3  WP3(t- 3) 

0.797 

2  WP3(t- 2) 

0.88 

4^ 

1 

1 

4^ 

0.712 

Table  5 

Selected  inputs,  WP4(t),  for  the  Alberta  wind  power  time  series,  4th  season. 

Rank 

Selected 

inputs 

Auto¬ 

correlation 

Rank 

i 

Selected 

inputs 

Auto¬ 

correlation 

1 

WP4(t- 1) 

0.973 

4 

WP4(t— 4) 

0.831 

2 

WP4(t—2) 

0.929 

5 

WP4(t- 5) 

0.779 

3 

WP4(t- 3) 

0.881 

6 

WP4(t- 6) 

0.726 

5. Design and evaluation of the forecasting models

5.1. Input selection

A multilayer perceptron feed-forward neural network with two
hidden layers is proposed in this paper for short-term forecasting
of the wind power time series. Based on the performed analyses, and
considering the short-term predictability of the wind time series as
well as its non-stationarity and seasonality, four separate neural
network models have been synthesized, one for forecasting the
Alberta wind time series in each season. In order to select the
appropriate inputs, a threshold of 0.7 on the auto-correlation has
been used to determine which lagged data serve as the network
inputs. The lagged data and the underlying correlations are
presented in Tables 2-5, corresponding to the auto-correlation
graphs shown in Fig. 3. In these tables, WPi, i = 1, ..., 4, stands
for the wind power time series of season i. Based on these results,
the wind power of 1 to 6 hours before the desired hour has been
considered as the neural network models' inputs for the first,
second and fourth seasons, while only the wind power of 1 to 4 hours
before is used for the third season.

In order to find the optimal network input set, several
correlation thresholds were evaluated. Among them, the selected
threshold, and hence the selected input set, on the one hand
reflects the correlation properties of the available data and, on
the other hand, yields a proper convergence rate.


Table 2

Selected inputs for the Alberta wind power time series, 1st season.

Rank  Selected inputs  Auto-correlation
1     WP1(t-1)         0.976
2     WP1(t-2)         0.933
3     WP1(t-3)         0.881
4     WP1(t-4)         0.826
5     WP1(t-5)         0.766
6     WP1(t-6)         0.706
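The lag selection described in this section can be illustrated with a short sketch. It is not the authors' implementation: the function names (`autocorrelation`, `select_lags`, `build_dataset`) and the `max_lag` search limit are assumptions introduced here, and only the 0.7 threshold comes from the text.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def select_lags(series, threshold=0.7, max_lag=24):
    """Return (lag, autocorrelation) pairs whose autocorrelation reaches the threshold.

    max_lag is an assumed search limit; the paper does not state one.
    """
    pairs = [(lag, autocorrelation(series, lag)) for lag in range(1, max_lag + 1)]
    return [(lag, r) for lag, r in pairs if r >= threshold]

def build_dataset(series, lags):
    """Stack the lagged values as input rows X and the current value as target y."""
    x = np.asarray(series, dtype=float)
    max_lag = max(lags)
    X = np.column_stack([x[max_lag - lag: len(x) - lag] for lag in lags])
    y = x[max_lag:]
    return X, y
```

Applied to a seasonal wind power series, `select_lags` would return the candidate inputs in order of increasing lag, and `build_dataset` turns them into input/target pairs for a network.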

5.2.  Network  configuration 

As stated earlier, in training an ANN, the number of hidden
layers and the number of neurons in each layer considerably affect
the prediction precision and the training rate. Therefore, in
order to find the optimal network architecture, several network
configurations were evaluated. These included networks with
different numbers of hidden layers, different numbers of neurons in
each layer and different types of transfer functions. We converged
to a configuration with two hidden layers and the following numbers
of neurons: 6 in the input layer for the first, second and fourth
seasons and 4 for the third season, 7 and 5 in the two hidden
layers, and 1 in the output layer. All of the input data were
normalized between -1 and 1. Based on this normalization, the
transfer function for the input and hidden layer neurons has been
selected as the tan-sigmoid transfer function, defined by:

f(x) = \frac{2}{1 + e^{-2x}} - 1

A linear transfer function is used in the neurons of the
output layer. For training the network, the neural network toolbox
of MATLAB [60] was selected due to its flexibility and simplicity
[9]. The cost function of Eq. (4) is considered as the training index,
and ICA, GA and PSO have been employed to find the network weights
that minimize this cost function. The properties and parameters of
the optimization approaches are given in Table 6. For each seasonal
network, 1200 samples of the corresponding wind power data set have
been used for training and 168 samples (one week of hourly data)
for evaluation.
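To make the configuration concrete, the following sketch evaluates a 6-7-5-1 network from a single flat parameter vector, which is the form a population-based optimizer such as ICA, GA or PSO would search over. It is written in Python rather than with the MATLAB toolbox used in the paper. Several points are assumptions: the names (`LAYER_SIZES`, `forward`, `cost`, `scale_to_unit_range`) are hypothetical, the input layer is treated as a pass-through, and the mean squared error stands in for the cost function of Eq. (4), which is not reproduced in this section.

```python
import numpy as np

LAYER_SIZES = (6, 7, 5, 1)  # inputs, hidden 1, hidden 2, output (seasons 1, 2 and 4)

def scale_to_unit_range(x, lo, hi):
    """Linearly map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (np.asarray(x, dtype=float) - lo) / (hi - lo) - 1.0

def n_parameters(sizes=LAYER_SIZES):
    """Total number of weights and biases of the fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def forward(theta, X, sizes=LAYER_SIZES):
    """Network output for a flat parameter vector theta and inputs X (n_samples, n_inputs)."""
    theta = np.asarray(theta, dtype=float)
    a = np.asarray(X, dtype=float)
    pos = 0
    last = len(sizes) - 2  # index of the output layer transition
    for k, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = theta[pos:pos + n_in * n_out].reshape(n_in, n_out)
        pos += n_in * n_out
        b = theta[pos:pos + n_out]
        pos += n_out
        z = a @ W + b
        # tan-sigmoid (tanh) in the hidden layers, linear output layer
        a = z if k == last else np.tanh(z)
    return a.ravel()

def cost(theta, X, y, sizes=LAYER_SIZES):
    """Mean squared error; an assumed stand-in for the paper's Eq. (4)."""
    return float(np.mean((forward(theta, X, sizes) - np.asarray(y, dtype=float)) ** 2))
```

With this layout the optimizer handles 95 parameters for the six-input networks and 81 for the four-input network of the third season (`n_parameters((4, 7, 5, 1))`).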

5.3.  Evaluation  results 

In this section, the performance of the proposed prediction
engine is investigated; that is, the performance of the neural
networks trained by ICA, PSO and GA is compared for wind power
prediction. Figs. 5-8 show the results of the trained neural
networks for the three cases. For comparison, the results of the
method in [8] are shown as well. In addition, Table 7 lists the
validation indices, i.e. MAE, RMSE, Modified_MAPE and
Modified_PAPE, for the test data. As seen from these results, the
proposed prediction engine outperforms the method in [8]. Among
the three proposed


Fig. 5. The actual wind power time series and the series forecasted by the hybrid
NNs for the 1st test week of Alberta.


Fig. 6. The actual wind power time series and the series forecasted by the hybrid
NNs for the 2nd test week of Alberta.


Fig. 7. The actual wind power time series and the series forecasted by the hybrid
NNs for the 3rd test week of Alberta.


Fig. 8. The actual wind power time series and the series forecasted by the hybrid
NNs for the 4th test week of Alberta.




Fig.  9.  The  convergence  curve  of  the  proposed  hybrid  neural  networks  for  various 
optimization  methods. 


Table 6

The properties and parameters of the employed optimization algorithms.

Algorithm  Parameter                            Value
ICA        Number of initial countries          40
ICA        Number of initial imperialists       8
ICA        Revolution rate                      0.3
ICA        Assimilation coefficient (β)         2
ICA        Assimilation angle coefficient (γ)   0.5
ICA        ζ                                    0.02
PSO        Population size (swarm size)         200
PSO        Personal learning coefficient (c1)   2
PSO        Global learning coefficient (c2)     2
PSO        Inertia weight damping ratio (w)     0.99
GA         Population size                      100
GA         Crossover percentage                 0.7
GA         Mutation percentage                  0.2
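As an illustration of how the PSO settings of Table 6 enter the search, a minimal global-best PSO is sketched below. It is not the authors' code. The swarm size, learning coefficients and inertia damping ratio follow the table, while the initial inertia weight of 1, the search bound, the velocity limit, the iteration count and the function name `pso_minimize` are assumptions made for this sketch.

```python
import numpy as np

def pso_minimize(cost, dim, swarm_size=200, c1=2.0, c2=2.0,
                 w=1.0, w_damp=0.99, n_iter=100, bound=1.0, seed=0):
    """Minimise cost(theta) over [-bound, bound]^dim with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    v_max = 0.2 * bound  # assumed velocity limit: 10% of the search range
    pos = rng.uniform(-bound, bound, size=(swarm_size, dim))
    vel = np.zeros((swarm_size, dim))
    pbest_pos = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest_pos, gbest_cost = pbest_pos[g].copy(), float(pbest_cost[g])
    history = []  # best cost after each iteration (convergence curve)
    for _ in range(n_iter):
        r1 = rng.random((swarm_size, dim))
        r2 = rng.random((swarm_size, dim))
        # inertia term + pull towards personal best + pull towards global best
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        vel = np.clip(vel, -v_max, v_max)
        pos = np.clip(pos + vel, -bound, bound)
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), float(pbest_cost[g])
        history.append(gbest_cost)
        w *= w_damp  # inertia weight damping
    return gbest_pos, gbest_cost, history
```

To train a network, `cost` would be a function of the flat weight vector that returns the training error, and `dim` the number of weights and biases; the returned `history` corresponds to a convergence curve of the kind shown in Fig. 9.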
Table 7

The comparison of the evaluation indices for different hybrid methods (all indices computed on test data).

Method             Test week  Correlation  MAE     RMSE     Modified_MAPE  Modified_PAPE
ICA-NN             1          0.99493      3.4320  4.2963   7.3888         27.6514
ICA-NN             2          0.98765      3.5459  4.5374   7.8355         28.9272
ICA-NN             3          0.97657      4.7084  6.6030   10.9268        47.3932
ICA-NN             4          0.98407      6.9516  9.5403   13.2342        87.6471
PSO-NN             1          0.98136      5.8490  7.6116   12.5770        59.3311
PSO-NN             2          0.95684      5.9023  7.5083   13.0523        43.7229
PSO-NN             3          0.94692      6.9452  9.7647   16.1152        77.0529
PSO-NN             4          0.98296      7.2965  10.1025  13.9086        87.1824
GA-NN              1          0.98146      7.0851  8.5152   15.2348        65.3294
GA-NN              2          0.95513      7.0337  8.5009   15.5544        53.9946
GA-NN              3          0.93777      7.4237  10.8097  17.2253        78.37353
GA-NN              4          0.98040      8.2172  11.1027  15.6636        85.7794
The method in [8]  1          0.9791       6.9984  9.2971   15.0669        87.3696
The method in [8]  2          0.9319       7.2740  9.6440   16.0733        66.6019
The method in [8]  3          0.90547      8.7586  12.3679  20.3263        108.9452
The method in [8]  4          0.9624       7.7107  13.8326  15.0669        109.7564


hybrid cases, the hybrid of ICA and NN shows the best performance,
with the lowest MAE, RMSE and Modified_MAPE in every test week
(Table 7). From the convergence point of view, the methods are
compared in Fig. 9: ICA in conjunction with the neural network
shows the fastest convergence, while the neural network model
trained by PSO converges faster than the model trained by GA.
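The indices compared above can be computed from an actual and a forecasted series as sketched below. MAE, RMSE and the correlation coefficient follow their standard definitions. The Modified_MAPE formula used here, the mean absolute error divided by the mean actual power, is an assumption, since its definition is not reproduced in this section. The ratio of MAE to Modified_MAPE in Table 7 is nearly constant within each test week, which is consistent with this reading but does not confirm it. Modified_PAPE is omitted for the same reason, and the function name `evaluation_indices` is hypothetical.

```python
import numpy as np

def evaluation_indices(actual, forecast):
    """Return correlation, MAE, RMSE and an assumed Modified_MAPE (in percent)."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    err = a - f
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    corr = float(np.corrcoef(a, f)[0, 1])
    # Assumed definition: mean absolute error normalised by the mean actual power.
    modified_mape = 100.0 * mae / float(np.mean(a))
    return {"correlation": corr, "MAE": mae, "RMSE": rmse,
            "Modified_MAPE": modified_mape}
```

Normalising by the mean actual power rather than by each hourly value keeps the index finite when the wind power is close to zero in some hours.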


6.  Conclusions 

In this paper, accurate forecasting of wind power, a key
requirement for the proper operation of a wind farm, has been
considered. The data to be forecast are the four seasonal wind
power data sets of the Alberta, Canada wind farm, which serve as
the real data for model construction and evaluation. In order to
synthesize an accurate model for wind power prediction, the
behavior of the wind power time series has first been characterized
via a powerful time series analysis method known as the recurrence
plot. This characterization shows that the wind time series behaves
as a stochastic signal with short-term predictability.
Non-stationarity and seasonality are the other characteristics of
the wind power time series. Based on the analysis results,
short-term forecasting of the wind time series has been addressed
via several hybrid optimized neural network models. Owing to the
short-term predictability of the time series, its recent past
values, which are highly correlated with the current hourly wind
power, have been considered as the model inputs. This correlation
analysis has led to the selection of the wind power at most 1 to 6 h
before the desired hour as the neural network models' inputs. Next,
the neural network model has been trained via three powerful
optimization algorithms, namely GA, PSO and ICA. The prediction
results and the evaluation indices show that the hybrid model of
neural network and ICA outperforms the others. Low error indices
and very fast convergence are the main properties of the hybrid
ICA-neural network model.